Abstract

We refine a protein model that reproduces fundamental aspects of protein
thermodynamics. The model exhibits two transitions, hot and cold unfolding.
The number of relevant parameters is reduced to three: 1) the binding energy
of folding relative to the orientational energy of bound water, 2) the ratio
of degrees of freedom between the folded and unfolded protein chain, and 3)
the number of water molecules that can access the hydrophobic parts of the
protein interior. Increasing the number of water molecules in the model
brings the two peaks in the heat capacity curve closer together, which is
more consistent with experimental data. Finally, we show that if we, as a
speculative assumption, assign only two distinct energy levels to the bound
water molecules, we obtain better correspondence with experiments.

PACS: 05.70.Ce, 87.14.Ee, 87.15.Cc

1 Introduction

Proteins are crucial components in all living organisms. In order to have
biological functionality at physiological temperatures it is important that
they have an exclusively ordered state, termed the native state. Anfinsen
showed that the native state is genetically determined [?], which means that
each protein, with its specific amino acid sequence, folds into a unique
conformation. Anfinsen's experiment also showed that the native state is
thermodynamically determined, i.e. it is the state in which the Gibbs free
energy of the whole system is lowest. It is now commonly accepted that folding
of the polypeptide chain is thermodynamically driven.

A peculiar feature of proteins is that they fold on time scales from $10^{-3}$ s
to 1 s. If one estimates the folding time by treating folding as a purely
stochastic search, one finds astronomical time scales [?]. This is called the
"Levinthal paradox". A resolution of this apparent paradox is outlined in a
recent review by
Shakhnovich [?], who describes how the protein first forms a
"nucleation-condensate" [?] via thermal fluctuations of the polypeptide chain,
whereupon a transition state (TS) occurs that shares common features with the
native state and from which the protein descends downhill in the Gibbs free
energy landscape to the native state. The recent point of view is that the
"TS-pathway" is not a concrete mechanistic pathway on which every position
corresponds to a unique conformation. Instead, a "statistical pathway" is
introduced, where a new step forward on the pathway means reaching a
statistical ensemble of conformations that is more favorable with regard to
Gibbs free energy. However, each step along the path, describing an ensemble
of conformations, should have some common structural features which act as
checkpoints for the folding. Furthermore, these checkpoints of increasing
order are likely to follow a folding pathway [?], where one particular point
on the pathway depends on the assumption that the main structures of the
earlier steps are conserved.

Unfolding of the polypeptide chain by increasing the temperature is somewhat
intuitive. What is rather surprising is that proteins also unfold at low
temperatures, i.e. they become denatured and biologically nonfunctional. Cold
denaturation seems to be a general property of small globular proteins [?].

The paper is organized as follows. In Sec. 2 we present the model and calculate
the partition function. In Sec. 3 we discuss the thermodynamics of the model and
present results for folding and unfolding transitions.

2 The physical model 

2.1 Polypeptide chain 

We refine a physical model for a small globular protein, which builds on earlier
models by Hansen et al. [?] and Bakk et al. [11]. The protein is viewed as a
zipper (Fig. 1), in analogy to the model of Dill et al. [?], which is a
one-dimensional model of a folding pathway. The complex three-dimensional
protein is equipped with $N$ contact points, which we here call nodes. Each
individual node is assigned an energy $-\epsilon_0 < 0$ if it is folded
(native), and zero otherwise [?]. This means that a folded node is
energetically favorable. The pathway assumption requires that if node $i$ is
folded, all nodes $j < i$ are also folded. Viewing the nodes as distinct
contact points in space is a simplification. Folding node 1 means finding the
"nucleation-condensate", which is reached through a condensation down to a
structure that marks the beginning of the folding pathway and guides the
protein into the native state. Following the discussion in Section 1, each
individual node is regarded as a statistical ensemble, and the nodes are likely
to form non-local contacts, which may be important for the cooperativity [?].
However, the specific nodes do have some common structural motifs. The specific
mechanism that forms this "nucleation-condensate" is not considered in this
paper; we assume that the condensate exists and restrict the study to the
TS-pathway, which eventually folds the protein into its native conformation.

We introduce binary contact variables $\phi_i \in \{0, 1\}$: $\phi_i = 0$ means
that node $i$ is open (unfolded), and $\phi_i = 1$ means that node $i$ is
folded. Assuming $N$ nodes, the Hamiltonian $H_1$ for the energies associated
with the polypeptide chain can be written compactly as [?]

$$H_1 = -\epsilon_0\,(\phi_1 + \phi_1\phi_2 + \cdots + \phi_1\phi_2\cdots\phi_N). \tag{1}$$

The product terms $(\phi_1 \cdots \phi_i)$ implement the assumption of a folding
pathway, because if $\phi_i = 0$, every term that contains $\phi_i$, i.e. term
$i$ and all terms $j > i$, vanishes.
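As a minimal sketch of Eq. (1), the following Python snippet evaluates the chain energy for a given set of contact variables. The function names and the default `eps0 = 1.0` are illustrative assumptions and are not part of the model definition.

```python
from itertools import product


def chain_energy(phi, eps0=1.0):
    """Energy H_1 of Eq. (1) for a tuple phi of contact variables (0 or 1)."""
    energy = 0.0
    running_product = 1
    for contact in phi:
        running_product *= contact          # phi_1 * phi_2 * ... * phi_i
        energy -= eps0 * running_product    # each product term contributes -eps0
    return energy


def energy_table(n_nodes, eps0=1.0):
    """Map every configuration of n_nodes contacts to its chain energy."""
    return {phi: chain_energy(phi, eps0)
            for phi in product((0, 1), repeat=n_nodes)}
```

For example, `chain_energy((1, 1, 0, 1))` gives $-2\epsilon_0$: the fourth contact contributes nothing because node 3 is open, which is the zipper property discussed above.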

The unfolded protein will access more degrees of freedom than the native
protein, because an unfolded polypeptide backbone has rotational freedom
represented by the dihedral angles [19]. This can be further simplified to one
"pseudodihedral" angle [?], and is incorporated by assigning each unfolded
node $f$ degrees of freedom. The parameter $f$ is interpreted as the relative
increase in the degrees of freedom of an unfolded node compared to a folded node.



2.2 Water interactions 

Introduction of water is important for several reasons. First, proteins are
exposed to water in vivo, and second, water has several peculiar properties due
to the polarity of the water molecules and the hydrogen bonds between them.
Makhatadze and Privalov [?] state that, in sum, hydration effects destabilize
the native state, and that this destabilizing action increases with decreasing
temperature. This is termed the "hydrophobic force". The water-protein
interaction is incorporated by an energy ladder representing each individual
water molecule associated with the unfolded parts of the protein (i.e. all
nodes where $\phi_i = 0$) [?]:

$$\omega_{ij} \in \{-\epsilon_w,\; -\epsilon_w + \delta,\; -\epsilon_w + 2\delta,\; \ldots,\; -\epsilon_w + (g-1)\delta\}. \tag{2}$$

Here $\omega_{ij}$ is the energy of water molecule $j$ at node $i$, and
$\epsilon_w > 0$. Interactions between the water molecules are not considered
in this paper. Eq. (2) lists all available energies for a water molecule
associated with the unfolded node $i$. Here we let $M$ water molecules be
associated with each unfolded node, whereas Hansen et al. [?] and Bakk et
al. [11] restricted this number to one. No water is supposed to access a folded
node, i.e. the protein interior.
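The Boltzmann sum over one such ladder is a finite geometric series. The sketch below builds the $g$ equidistant levels and compares the direct sum with the closed form $e^{\epsilon_w\beta}(1 - e^{-g\delta\beta})/(1 - e^{-\delta\beta})$. The function names and the numerical values in the usage note are illustrative assumptions.

```python
import math


def ladder_levels(eps_w, delta, g):
    """Energies -eps_w + s*delta for s = 0, ..., g-1 (one water ladder)."""
    return [-eps_w + s * delta for s in range(g)]


def ladder_sum_direct(beta, eps_w, delta, g):
    """Boltzmann sum over all g levels of one bound water molecule."""
    return sum(math.exp(-beta * level)
               for level in ladder_levels(eps_w, delta, g))


def ladder_sum_closed(beta, eps_w, delta, g):
    """The same sum written as a finite geometric series."""
    # expm1(-x) = exp(-x) - 1, so the ratio equals (1 - e^{-g x}) / (1 - e^{-x}).
    ratio = math.expm1(-g * delta * beta) / math.expm1(-delta * beta)
    return math.exp(beta * eps_w) * ratio
```

With, say, `beta = 0.7`, `eps_w = 0.5`, `delta = 0.3` and `g = 6`, the two functions agree to rounding error.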

The ladder contains $g$ equidistant energy levels. These give an entropy
contribution when node $i$ is folded, because the water is then unbound. Hence
a folded node implies an entropy contribution from $g^M$ degrees of freedom.
The ladder is of course a simplification; some set of energy levels is needed
to make it energetically favorable to unfold at low temperatures, and the
energy ladder in Eq. (2) is introduced for computational convenience. We note
that the proposed energy ladder is in fact nothing but the quantized energy
levels of a magnetic dipole in an external magnetic field. In the limit
$g \to \infty$ (with $g\delta$ finite), the classical limit for a magnetic
moment of fixed length is obtained. This in turn is equivalent to an electric
dipole in an electric field. The latter can be interpreted as a direct physical
model of dipolar water molecules that feel an effective electric field from the
protein. In a protein, an electric field arises from the permanent and induced
charges on the protein surface that become exposed after unfolding of a node.
This field will interact with the nearest water molecules (dipoles) and
structure them. The quantitative aspects of the folding problem will probably
need a discussion of additional interactions, but this will not be considered
here. Fig. 1 is a schematic illustration of a partly folded protein with water
associated with the hydrophobic parts that are uncovered upon unfolding of the
nodes.

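Using the same notation as in Eq. (1), the energy associated with the
water-protein interactions, $H_2$, becomes

$$\begin{aligned}
H_2 = {} & (1 - \phi_1)(\omega_{11} + \omega_{12} + \cdots + \omega_{1M}) + (1 - \phi_1\phi_2)(\omega_{21} + \omega_{22} + \cdots + \omega_{2M}) \\
& + \cdots + (1 - \phi_1\phi_2\cdots\phi_N)(\omega_{N1} + \omega_{N2} + \cdots + \omega_{NM}).
\end{aligned} \tag{3}$$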



2.3 The partition function 

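The Hamiltonian $H = H_1 + H_2$ describing the entire system is then

$$\begin{aligned}
H = {} & -\epsilon_0\,(\phi_1 + \phi_1\phi_2 + \cdots + \phi_1\phi_2\cdots\phi_N) \\
& + (1 - \phi_1)(\omega_{11} + \omega_{12} + \cdots + \omega_{1M}) + (1 - \phi_1\phi_2)(\omega_{21} + \omega_{22} + \cdots + \omega_{2M}) \\
& + \cdots + (1 - \phi_1\phi_2\cdots\phi_N)(\omega_{N1} + \omega_{N2} + \cdots + \omega_{NM}).
\end{aligned} \tag{4}$$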

The partition function is $Z = \sum_{i=0}^{N} Z_i$, where the term $Z_i$,
corresponding to folding of all nodes $\le i$ (pathway assumption), is

$$Z_i = f^{N-i}\, g^{iM}\, e^{i\epsilon_0\beta} \left( e^{\epsilon_w\beta}\, \frac{1 - e^{-g\delta\beta}}{1 - e^{-\delta\beta}} \right)^{(N-i)M}. \tag{5}$$

$\beta = 1/T$ is a rescaled inverse absolute temperature where the Boltzmann
constant is absorbed in $T$. $Z_0$ means that all nodes are open, i.e. a
completely unfolded protein. The factor $f^{N-i}$ in Eq. (5) arises from the
degrees of freedom in the polypeptide chain that are available in the $N - i$
unfolded nodes. Further, the factor $g^{iM}$ is the entropy liberated from the
$M$ free non-interacting water molecules associated with each of the $i$ folded
nodes, and $e^{i\epsilon_0\beta}$ is the Boltzmann factor from $i$ contact
energies $-\epsilon_0$ in the polypeptide chain. The last term in brackets is
simply the sum over all distinct levels in one water ladder, raised to the
power of the number of water molecules, $(N - i)M$, bound to the unfolded
hydrophobic parts of the protein. A rearrangement of Eq. (5) gives

$$Z_i = \left(g^{M} e^{\epsilon_0\beta}\right)^{N} r^{\,i-N}, \tag{6}$$

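where we have defined

$$r^{1/M} = \frac{g\, e^{(\epsilon_0/M - \epsilon_w)\beta}}{f^{1/M}} \cdot \frac{1 - e^{-\delta\beta}}{1 - e^{-g\delta\beta}}. \tag{7}$$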
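The rearrangement from Eq. (5) to Eqs. (6) and (7) can be checked numerically. The sketch below evaluates $Z_i$ once directly (explicit ladder sum) and once as prefactor times $r^{i-N}$, reading Eq. (6) as $Z_i = (g^M e^{\epsilon_0\beta})^N r^{i-N}$ and Eq. (7) as the definition of $r^{1/M}$. All function names and the parameter values in the usage note are assumptions made for illustration.

```python
import math


def ladder_sum(beta, eps_w, delta, g):
    """Explicit Boltzmann sum over the g levels -eps_w + s*delta."""
    return sum(math.exp(-beta * (-eps_w + s * delta)) for s in range(g))


def z_i_direct(i, beta, N, M, f, g, eps0, eps_w, delta):
    """Z_i as in Eq. (5): chain entropy, free-water entropy, contacts, bound water."""
    water = ladder_sum(beta, eps_w, delta, g) ** ((N - i) * M)
    return f ** (N - i) * g ** (i * M) * math.exp(i * eps0 * beta) * water


def r_value(beta, M, f, g, eps0, eps_w, delta):
    """r from Eq. (7), using the closed form of the ladder sum."""
    root = (g * math.exp((eps0 / M - eps_w) * beta) / f ** (1.0 / M)
            * (1.0 - math.exp(-delta * beta))
            / (1.0 - math.exp(-g * delta * beta)))
    return root ** M


def z_i_rearranged(i, beta, N, M, f, g, eps0, eps_w, delta):
    """Z_i as in Eq. (6): a common prefactor times r**(i - N)."""
    prefactor = (g ** M * math.exp(eps0 * beta)) ** N
    return prefactor * r_value(beta, M, f, g, eps0, eps_w, delta) ** (i - N)
```

For a small test system, for example `N = 5`, `M = 3`, `f = 2.0`, `g = 4`, `eps0 = 1.0`, `eps_w = 0.5`, `delta = 0.3` and `beta = 0.7`, the two expressions agree for every `i` from 0 to `N`.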



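We assume that $\delta\beta \ll 1$ (i.e. $g \to \infty$ with $g\delta$ finite),
which means an infinitely small level spacing in the water ladder. A Taylor
expansion then yields $1 - e^{-\delta\beta} \approx \delta\beta$, and Eq. (7)
can be rewritten as

$$r = \left( \frac{a\, \beta\, e^{\mu\beta}}{\sinh\beta} \right)^{M}. \tag{8}$$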



Here $a = 1/f^{1/M}$, and the inverse temperature is rescaled by
$g\delta\beta/2 \to \beta$. The parameter $a$ reflects the ratio of the degrees
of freedom between the folded and the unfolded units of the protein chain. The
new energy parameter $\mu = (\epsilon_0/M - \epsilon_w + g\delta/2)/(g\delta/2)$
is proportional to the binding energy of each node, and may be interpreted as an
effective chemical potential for each single protein. Changing the environment
of the protein, e.g. by adding denaturants or changing pH, changes this chemical
potential. We calculate the partition function by summing the $Z_i$ terms in
Eq. (6):

$$Z = \sum_{i=0}^{N} Z_i = g^{NM} e^{c\beta} \sum_{i=0}^{N} r^{\,i-N} = g^{NM} e^{c\beta}\, \frac{1 - r^{-(N+1)}}{1 - r^{-1}}, \tag{9}$$

where $c = 2N\epsilon_0/(g\delta)$.
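The continuum limit can be illustrated numerically: for a fine ladder (large $g$, small $\delta$, fixed $g\delta$) the finite-ladder expression for $r^{1/M}$ should approach $a\beta e^{\mu\beta}/\sinh\beta$ evaluated at the rescaled inverse temperature $g\delta\beta/2$. The sketch below is a check under that reading of the rescaling; the function names and test parameters are assumptions.

```python
import math


def r_root_finite(beta, eps0, eps_w, delta, g, f, M):
    """r**(1/M) for a ladder with g levels and spacing delta (unscaled beta)."""
    boltz = math.exp((eps0 / M - eps_w) * beta)
    # expm1(-x) = exp(-x) - 1; the ratio is (1 - e^{-delta*beta}) / (1 - e^{-g*delta*beta}).
    ratio = math.expm1(-delta * beta) / math.expm1(-g * delta * beta)
    return g * boltz * ratio / f ** (1.0 / M)


def r_root_continuum(beta, eps0, eps_w, delta, g, f, M):
    """Continuum-limit r**(1/M): a * b * exp(mu*b) / sinh(b), with b = g*delta*beta/2."""
    half_width = g * delta / 2.0
    a = 1.0 / f ** (1.0 / M)
    mu = (eps0 / M - eps_w + half_width) / half_width
    b = half_width * beta                      # rescaled inverse temperature
    return a * b * math.exp(mu * b) / math.sinh(b)
```

With `g = 10000` and `delta = 2e-4` (so that $g\delta = 2$) the two functions differ by a relative amount of order $\delta\beta/2$, whereas a coarse ladder such as `g = 4`, `delta = 0.5` deviates visibly.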

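The order parameter ("reaction coordinate") of this system is $n$, the degree
of folding, i.e. the mean number of folded nodes divided by $N$:

$$n = \frac{\sum_{i=0}^{N} i\, Z_i}{N Z} = \frac{r}{N} \frac{d}{dr} \ln \sum_{i=0}^{N} r^{i} = \frac{r\,[N r^{N+1} - (N+1)\, r^{N} + 1]}{N\,(r - 1)(r^{N+1} - 1)}. \tag{10}$$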
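Because $Z_i \propto r^{i-N}$, the degree of folding is the mean of $i$ under weights $r^i$, divided by $N$. The sketch below compares that direct average with the closed form of the geometric sums, $n = r[N r^{N+1} - (N+1) r^N + 1] / [N (r-1)(r^{N+1}-1)]$. Function names are illustrative assumptions, and the closed form is treated separately at $r = 1$, where it is singular and the average equals $1/2$.

```python
import math


def order_parameter_direct(r, N):
    """Mean number of folded nodes divided by N, with weights r**i."""
    weights = [r ** i for i in range(N + 1)]
    mean_i = sum(i * w for i, w in enumerate(weights)) / sum(weights)
    return mean_i / N


def order_parameter_closed(r, N):
    """Closed form of the same average from the geometric-series sums."""
    if math.isclose(r, 1.0, rel_tol=0.0, abs_tol=1e-12):
        return 0.5                      # all N + 1 states equally weighted
    numerator = r * (N * r ** (N + 1) - (N + 1) * r ** N + 1.0)
    denominator = N * (r - 1.0) * (r ** (N + 1) - 1.0)
    return numerator / denominator
```

For `N = 100`, values of `r` below 1 give `n` close to 0 (unfolded) and values above 1 give `n` close to 1 (folded), which is why the crossings `r = 1` mark the transitions.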



3 Thermodynamical calculations and discussion 

3.1 Continuum limit of the water energy levels 

The heat capacity is $C = \beta^2\, \partial^2 (\ln Z) / \partial\beta^2$. This
function is independent of the prefactor $g^{NM} e^{c\beta}$ in $Z$, because the
logarithm of the prefactor is linear in $\beta$. Furthermore, $Z$ contains the
function $r$, which has only three parameters: the amplitude factor $a$, the
effective chemical potential $\mu$, and the number of water molecules per
unfolded node $M$. We assume that the number of nodes is constant, say
$N = 100$, reflecting a typical number of residues in a small protein. The
number of relevant parameters in our physical model is thus reduced from the
initial six, $f$, $g$, $\epsilon_0$, $\epsilon_w$, $\delta$ and $M$, to only
three: $a$, $\mu$ and $M$.
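A minimal numerical sketch of this heat capacity is given below. It assumes the continuum-limit form $r = (a\beta e^{\mu\beta}/\sinh\beta)^M$, drops the prefactor of $Z$ (whose logarithm is linear in $\beta$), and approximates the second derivative by a central finite difference. The step size, the function names and the log-sum-exp evaluation are implementation choices, not taken from the paper.

```python
import math


def log_r(beta, a, mu, M):
    """ln r for r = (a * beta * exp(mu*beta) / sinh(beta))**M."""
    # ln sinh(beta) written so that it stays finite for large beta.
    log_sinh = beta + math.log1p(-math.exp(-2.0 * beta)) - math.log(2.0)
    return M * (math.log(a) + math.log(beta) + mu * beta - log_sinh)


def log_z(beta, a, mu, M, N=100):
    """ln of sum_{i=0}^{N} r**(i - N), i.e. ln Z without its prefactor."""
    lr = log_r(beta, a, mu, M)
    exponents = [-k * lr for k in range(N + 1)]
    top = max(exponents)                       # log-sum-exp for stability
    return top + math.log(sum(math.exp(e - top) for e in exponents))


def heat_capacity(T, a, mu, M, N=100, step=1e-4):
    """C = beta^2 * d^2(ln Z)/d beta^2 via a central finite difference."""
    beta = 1.0 / T
    second = (log_z(beta + step, a, mu, M, N)
              - 2.0 * log_z(beta, a, mu, M, N)
              + log_z(beta - step, a, mu, M, N)) / step ** 2
    return beta ** 2 * second
```

Scanning `heat_capacity(T, 0.5, 0.65, 20)` over rescaled temperatures between roughly 0.2 and 0.6 should show two narrow peaks separated by a region of much lower heat capacity.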

The partition function in Eq. (9), and thus the heat capacity $C$, is apparently
most sensitive to changes in $r$ for values $r \approx 1$. The function $r$ is
plotted in Fig. 2 for $a = 0.5$ and $M = 1$. Decreasing the effective chemical
potential decreases the separation between the two intersections with $r = 1$.
Larger $M$ only makes the function $r$ narrower and higher, while the
intersections with $r = 1$ are independent of the specific value of $M$.
$\mu_c \approx 0.63$ is a critical effective chemical potential, and
$\mu < \mu_c$ makes the protein denatured at all temperatures. This critical
point was studied for $M = 1$ in Ref. [?].
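The critical value can be estimated numerically as the smallest $\mu$ for which $r^{1/M} = a\beta e^{\mu\beta}/\sinh\beta$ still reaches 1 at some temperature. The sketch below does this by a grid search over $\beta$ combined with bisection in $\mu$. It assumes $0 < a < 1$ and that the critical value lies between 0 and 1; the grid range, step and names are arbitrary choices.

```python
import math


def r_root(beta, a, mu):
    """r**(1/M) in the continuum limit; independent of M."""
    return a * beta * math.exp(mu * beta) / math.sinh(beta)


def max_r_root(a, mu, beta_max=30.0, step=0.001):
    """Largest value of r**(1/M) on a grid of rescaled inverse temperatures."""
    n_steps = int(beta_max / step)
    return max(r_root(k * step, a, mu) for k in range(1, n_steps + 1))


def critical_mu(a, lo=0.0, hi=1.0, tol=1e-6):
    """Bisect for the mu at which the maximum of r**(1/M) equals 1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if max_r_root(a, mid) >= 1.0:
            hi = mid          # r still reaches 1: critical value is at or below mid
        else:
            lo = mid          # r < 1 everywhere: protein never folds
    return 0.5 * (lo + hi)
```

For `a = 0.5` this procedure returns a value close to 0.63, in line with the critical chemical potential quoted above.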

The heat capacity $C(T)$ in Fig. 3 shows two characteristic peaks. Calculating
the order parameter $n$ reveals that the protein is essentially unfolded in the
hot and cold temperature regions. This is notable because, as mentioned
earlier, hot and cold unfolding is a common feature of small globular proteins.
It makes sense that the protein is unfolded at low temperatures, because this
minimizes the energy. Increasing the temperature implies folding, regarded as a
compromise between entropy and energy. A further increase in temperature shakes
the protein, whereupon it eventually unfolds, i.e. the residual entropy of the
polypeptide chain dominates the Gibbs free energy. It is interesting to note
that the temperatures of the intersections $r = 1$ in Fig. 2 correspond to the
transition temperatures of the heat capacity in Fig. 3 for $M = 20$. The heat
capacity for $M = 1$ is somewhat smeared out, implying a slightly broader
separation between the cold and hot unfolding peaks.

Although the temperature in our model is rescaled, it may be important that the
relative difference between the peaks in the heat capacity,

$[T(\text{top 2}) - T(\text{top 1})] / T(\text{top 2})$,

corresponds to experimental data, where a typical value is 0.1 to 0.3 depending
on the chemical potential [12]. In order to make the separation between the
peaks smaller in our model, we can decrease $\mu$ or $a$, or both. In Fig. 4
the value $\mu = 0.635$ is slightly decreased compared to Fig. 3, where
$\mu = 0.65$. This results in a smaller peak separation. The order parameter
$n$ in Fig. 5 shows that for $M = 1$ the protein is only partly folded between
the two transition temperatures, while for $M = 20$ the protein is nearly
completely folded. This suggests that, for a fixed system size $N$, several
water molecules per unfolded node ($M \gg 1$) are important in order to get a
more realistic separation between the two peaks in the heat capacity.
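The relative peak separation can be estimated without computing the full heat capacity by using the two temperatures at which $r = 1$ as a proxy for the peak positions, which the text indicates is a good approximation for large $M$. The sketch below locates both crossings of $\ln r^{1/M} = 0$ by a grid scan followed by bisection. It assumes "top 2" denotes the hot peak; the names and grid settings are illustrative.

```python
import math


def log_r_root(beta, a, mu):
    """ln of r**(1/M) = a*beta*exp(mu*beta)/sinh(beta); zero where r = 1."""
    return math.log(a * beta / math.sinh(beta)) + mu * beta


def bisect_root(func, lo, hi, tol=1e-10):
    """Bisection on an interval [lo, hi] over which func changes sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def transition_temperatures(a, mu, beta_max=30.0, step=0.01):
    """Return (T_cold, T_hot), the two rescaled temperatures where r = 1."""
    betas = [k * step for k in range(1, int(beta_max / step) + 1)]
    roots = []
    for b0, b1 in zip(betas, betas[1:]):
        if (log_r_root(b0, a, mu) > 0) != (log_r_root(b1, a, mu) > 0):
            roots.append(bisect_root(lambda b: log_r_root(b, a, mu), b0, b1))
    if len(roots) != 2:
        raise ValueError("expected exactly two crossings of r = 1")
    beta_hot, beta_cold = roots        # smaller beta means higher temperature
    return 1.0 / beta_cold, 1.0 / beta_hot


def relative_separation(a, mu):
    """(T_hot - T_cold) / T_hot, the quantity compared with experiment."""
    t_cold, t_hot = transition_temperatures(a, mu)
    return (t_hot - t_cold) / t_hot
```

With `a = 0.5`, lowering `mu` from 0.65 to 0.635 reduces the relative separation substantially, which mirrors the comparison between the two heat capacity figures.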

3.2 Two level water interaction energy 

Finally, we discuss the case $g = 2$ for the function $r$ in Eq. (7). This
corresponds to an Ising spin model [?] with only two energy states per water
molecule. A rearranged version of $r$ then becomes

$$r = \left( \frac{a\, e^{\mu\beta}}{\cosh\beta} \right)^{M}, \tag{11}$$

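where $a = g/(2f^{1/M}) = 1/f^{1/M}$, $\mu = (\epsilon_0/M - \epsilon_w + \delta/2)/(\delta/2)$, and the inverse temperature is rescaled by $\delta\beta/2 \to \beta$. In Fig. 6, based on $r$ in Eq. (11),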
one sees that the warm peak is higher than the cold peak, which is the opposite
of the situation in Figs. 3 and 4. This feature corresponds better to
experimental results from Privalov et al. [?]. Experiments show that, for the
warm unfolding transition, the heat capacity of the unfolded state is higher
than that of the folded state, and that it has an upward slope that decreases
with increasing temperature [?], with which Fig. 6 is qualitatively consistent.
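For $g = 2$ no continuum approximation is needed: the finite-ladder expression for $r^{1/M}$ reduces exactly to $a e^{\mu\beta}/\cosh\beta$ after rescaling $\delta\beta/2 \to \beta$. The sketch below checks this identity under the assumption that the amplitude is $a = g/(2f^{1/M})$, which equals $1/f^{1/M}$ for $g = 2$; this normalization, like the function names, is an assumption of the sketch.

```python
import math


def r_root_finite_ladder(beta, eps0, eps_w, delta, f, M, g=2):
    """r**(1/M) for a ladder with g levels (unscaled beta), here with g = 2."""
    boltz = math.exp((eps0 / M - eps_w) * beta)
    # expm1(-x) = exp(-x) - 1; the ratio is (1 - e^{-delta*beta}) / (1 - e^{-g*delta*beta}).
    ratio = math.expm1(-delta * beta) / math.expm1(-g * delta * beta)
    return g * boltz * ratio / f ** (1.0 / M)


def r_root_two_level(beta, eps0, eps_w, delta, f, M):
    """Two-level form a * exp(mu*b) / cosh(b), with b = delta*beta/2."""
    a = 1.0 / f ** (1.0 / M)            # assumed: g / (2 f^{1/M}) with g = 2
    mu = (eps0 / M - eps_w + delta / 2.0) / (delta / 2.0)
    b = delta * beta / 2.0              # rescaled inverse temperature
    return a * math.exp(mu * b) / math.cosh(b)
```

The two functions agree to rounding error for any positive `beta`, so the two-level result is an exact special case rather than a limit.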

Although this two-level representation of water molecules gives results with
interesting features, it is not a proper representation of water. It may,
however, give a clue to a better physical model of the system that leads to the
same features of interest.

4 Conclusion 

We have in this paper refined the protein model proposed in Refs. [?] by
increasing the number of water molecules that can access the hydrophobic
interior of the protein. The refined model exhibits both the hot and cold
unfolding transitions. We have demonstrated that the model contains only three
effective parameters: 1) the binding energy of folding relative to the
orientational energy of bound water, 2) the ratio of degrees of freedom between
the folded and unfolded protein chain, and 3) the number of water molecules that
can access the hydrophobic parts of the protein interior. By increasing the
number of water molecules, we have shown that the hot and cold unfolding
transition peaks in the heat capacity curve move closer together than in the
earlier protein models. This is more consistent with the experimental data. By
assuming the water-protein interactions to be two-level, which is a speculative
assumption, the heat capacity peak corresponding to the cold transition becomes
smaller than the peak corresponding to the hot transition. This is in agreement
with experimental data, and opposite to the situation found in the earlier
protein models of Refs. [?].
Figure captions

Fig. 1. Schematic illustration of a partly folded protein containing $i$ folded
nodes and $N - i$ unfolded nodes associated with water (shaded).

Fig. 2. The function $r(T)$ in Eq. (8) for a variable effective chemical
potential $\mu$, $a = 0.5$ and $M = 1$.

Fig. 3. Heat capacity $C(T)$ for $M = 1$ (scaled by a factor 50) and $M = 10$,
showing two characteristic peaks for cold and hot unfolding. $a = 0.5$ and
$\mu = 0.65$.

Fig. 4. Heat capacity $C(T)$ for $M = 1$ (scaled by a factor 50) and $M = 20$.
$a = 0.5$ and $\mu = 0.635$.

Fig. 5. Order parameter $n(T)$ for $a = 0.5$ and $\mu = 0.635$.

Fig. 6. Heat capacity $C(T)$ for $a = 0.48$, $\mu = 0.65$ and $M = 20$. This
plot is based on the function $r$ in Eq. (11).